Optimal Checkpointing Strategies for Iterative Applications
Abstract
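This work provides an optimal checkpointing strategy to protect iterative applications from fail-stop errors. We consider a general framework, where the application repeats the same execution pattern by executing consecutive iterations, and where each iteration is composed of several tasks. These tasks have different execution lengths and different checkpoint costs. Assume that there are $n$ tasks and that task $a_i$, where $0 \le i < n$, has execution time $t_i$ and checkpoint cost $c_i$. A naive strategy would checkpoint after each task. Another naive strategy would checkpoint at the end of each iteration. A strategy inspired by the Young/Daly formula would work for $\sqrt{2 \mu c_{ave}}$ seconds, where $\mu$ is the application MTBF and $c_{ave}$ is the average checkpoint time, and checkpoint at the end of the current task (and repeat). Another strategy, also inspired by the Young/Daly formula, would select the task $a_{min}$ with the smallest checkpoint cost $c_{min}$ and would checkpoint after every $p$-th instance of that task, leading to a checkpointing period $pT$, where $T = \sum_{i=0}^{n-1} t_i$ is the time per iteration. One would choose the period so that $pT \approx \sqrt{2 \mu c_{min}}$ to obey the Young/Daly formula. All these naive and Young/Daly strategies are suboptimal. Our main contribution is to show that the optimal checkpoint strategy is globally periodic, and to design a dynamic programming algorithm that computes the optimal checkpointing pattern. This pattern may well checkpoint many different tasks, and this across many different iterations. We show through simulations, both from synthetic and real-life application scenarios, that the optimal strategy outperforms the naive and Young/Daly strategies.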
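As a rough illustration of the two Young/Daly-inspired baselines described in the abstract (not the paper's optimal dynamic programming algorithm), the sketch below computes the work interval based on the average checkpoint cost, and the multiplier p for the cheapest-task strategy. The function names, the sample task times and costs, the MTBF value, and the choice to round p to the nearest integer (at least 1) are all assumptions made for this example.

```python
import math

def young_daly_period(mtbf, ckpt_cost):
    """First-order Young/Daly period: sqrt(2 * MTBF * checkpoint cost)."""
    return math.sqrt(2.0 * mtbf * ckpt_cost)

def average_cost_period(mtbf, ckpt_costs):
    """Work interval for the baseline that uses the average checkpoint cost.

    After working this long, the baseline checkpoints at the end of the
    current task, then repeats.
    """
    c_ave = sum(ckpt_costs) / len(ckpt_costs)
    return young_daly_period(mtbf, c_ave)

def cheapest_task_strategy(mtbf, task_times, ckpt_costs):
    """Baseline that checkpoints after every p-th instance of the cheapest task.

    Returns (index of the cheapest task, p, resulting period p * T).
    Rounding p to the nearest integer is an assumption of this sketch.
    """
    i_min = min(range(len(ckpt_costs)), key=ckpt_costs.__getitem__)
    iteration_time = sum(task_times)  # T = sum of t_i over one iteration
    target = young_daly_period(mtbf, ckpt_costs[i_min])
    p = max(1, round(target / iteration_time))
    return i_min, p, p * iteration_time

# Hypothetical values, in seconds, chosen only for illustration.
task_times = [10.0, 20.0, 30.0]
ckpt_costs = [2.0, 1.0, 3.0]
mtbf = 7200.0

print(average_cost_period(mtbf, ckpt_costs))
print(cheapest_task_strategy(mtbf, task_times, ckpt_costs))
```

With these sample values, one iteration lasts 60 seconds and the cheapest checkpoint costs 1 second, so the target period is 120 seconds and the baseline checkpoints after every second instance of that task. According to the abstract, both baselines are suboptimal compared with the globally periodic pattern computed by dynamic programming.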
Similar resources
Optimal Sampling Strategies for Oceanic Applications
We have developed a method for optimal array design and applied it to a suite of applications, including the design of a surface mooring array in the tropical Indian Ocean (Sakov and Oke 2007). The method builds on the work of Bishop et al. (2001), using data assimilation theory to determine the observation locations that best constrain a data assimilating ocean model. The method seeks to ident...
Multigrid and Iterative Strategies for Optimal Control Problems
In this minisymposium we focus on optimal control problems, which constitute an important class of PDE-constrained optimization problems. There are many PDEs which can act as the constraints within the problem, such as Stokes-type equations, PDEs with a time-dependent component, and many others – consequently there is considerable potential for applications in applied sciences. One of the major...
Checkpointing Strategies for Scheduling Computational Workflows
We study the scheduling of computational workflows on compute resources that experience exponentially distributed failures. When a failure occurs, rollback and recovery is used to resume the execution from the last checkpointed state. The scheduling problem is to minimize the expected execution time by deciding in which order to execute the tasks in the workflow and deciding for each task wheth...
Checkpointing and Its Applications
This paper describes our experience with the implementation and applications of the Unix checkpointing library libckp, and identifies two concepts that have proven to be the key to making checkpointing a powerful tool. First, including all persistent state, i.e., user files, as part of the process state that can be checkpointed and recovered provides a truly transparent and consistent rollback....
Journal
Journal title: IEEE Transactions on Parallel and Distributed Systems
Year: 2022
ISSN: ['1045-9219', '1558-2183', '2161-9883']
DOI: https://doi.org/10.1109/tpds.2021.3099440